Inferring spatial relations from textual descriptions of images

Abstract

Generating an image from its textual description requires both a certain level of language understanding and common sense knowledge about the spatial relations of the physical entities being described. In this work, we focus on inferring the spatial relation between entities, a key step in the process of composing scenes based on text. More specifically, given a caption containing a mention of a subject, and the location and size of the bounding box of that subject, our goal is to predict the location and size of an object mentioned in the caption. Previous work did not use the caption text, but a manually provided relation holding between the subject and the object. In fact, the evaluation datasets used contain manually annotated ontological triplets but no captions, making the exercise unrealistic: a manual step was required, and systems did not leverage the richer information in captions. Here we present a system that uses the full caption, and Relations in Captions (REC-COCO), a dataset derived from MS-COCO which allows spatial relation inference to be evaluated from captions directly. Our experiments show that: (1) it is possible to infer the size and location of an object with respect to a given subject directly from the caption; (2) using the full text places the object better than using a manually annotated relation. Our work paves the way for systems that, given a caption, decide which entities need to be depicted and their respective locations and sizes, in order to then generate the final image.
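The task stated in the abstract can be read as a function from a caption and a subject bounding box to an object bounding box. The sketch below only illustrates that input/output shape. The names `Box`, `Example`, and `predict_object_box`, and the choice of a centre-plus-size box in normalized image units, are assumptions made for illustration and are not taken from the paper or from REC-COCO. The baseline is a trivial placeholder, not the authors' system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (assumed format): centre plus size, in [0, 1] image units."""
    cx: float
    cy: float
    w: float
    h: float


@dataclass(frozen=True)
class Example:
    """One task instance (hypothetical layout, not the REC-COCO schema)."""
    caption: str      # full caption text
    subject: str      # subject mention, expected to occur in the caption
    obj: str          # object mention, expected to occur in the caption
    subject_box: Box  # known location and size of the subject


def predict_object_box(example: Example) -> Box:
    """Placeholder baseline: return the subject's box unchanged.

    A real system would condition on the caption text; this stub only
    fixes the interface (caption + subject box in, object box out).
    """
    if example.subject not in example.caption or example.obj not in example.caption:
        raise ValueError("subject and object must both be mentioned in the caption")
    return example.subject_box


example = Example(
    caption="a man riding a horse on the beach",
    subject="man",
    obj="horse",
    subject_box=Box(cx=0.5, cy=0.4, w=0.2, h=0.4),
)
predicted = predict_object_box(example)
```

Any learned model with the same signature could replace the placeholder, and its output could then be compared against the annotated object box.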

Similar resources

Generating Descriptions of Spatial Relations between Objects in Images

We investigate the task of predicting prepositions that can be used to describe the spatial relationships between pairs of objects depicted in images. We explore the extent to which such spatial prepositions can be predicted from (a) language information, (b) visual information, and (c) combinations of the two. In this paper we describe the dataset of object pairs and prepositions we have creat...

Identifying and inferring objects from textual descriptions of scenes from books

Fiction authors rarely provide detailed descriptions of scenes, preferring the reader to fill in the details using their imagination. Therefore, to perform detailed text-to-scene conversion from books, we need to not only identify explicit objects but also infer implicit objects. In this paper, we describe an approach to inferring objects using Wikipedia and WordNet. In our experiments, we are ...

From Images to Sentences via Spatial Relations

This work presents a conceptual framework for representing, manipulating, measuring, and communicating in natural language several ideas about topological (non-metric) spatial locations, object spatial contexts, and user expectations of spatial relationships. It articulates a theory of spatial relations, how they can be represented as fuzzy predicates internally, and how they can be appropriate...

PERSPECTORS Inferring spatial relations from building product models

Building product models are beginning to find their way into AEC practice. They are proving useful for coordinating large multidisciplinary design and construction teams. In an evolving design and construction planning process, building components are added, modified, or deleted from the product model, causing important spatial relationships to emerge. In addition, new criteria can emerge throu...

SPATIAL INFERENCE AND CONSTRAINT SOLVING How to Depict Textual Spatial Descriptions from Internet

Today there are still many applications on the Internet where the user is given a textual description of a spatial configuration (e.g. chat, e-mail or newsgroups). The user is asked to imagine the scene and to draw inferences. We present a new approach to generate depictions of such scenes. Besides drawing spatial inferences, this leads to the problem of solving a system of complicated nume...

Journal

Journal title: Pattern Recognition

Year: 2021

ISSN: 1873-5142, 0031-3203

DOI: https://doi.org/10.1016/j.patcog.2021.107847